---
layout: page
title: Data Visualisation (ggplot)
permalink: /Rcoding/coding2/
parent: R Coding
nav_order: 2
---
Alongside the plotting functions we learnt about in script 2, there
is a package for creating prettier and more elegant data visualisations
than base R: ggplot2, an implementation of the grammar of graphics. You
can learn all about it on this website,
this
free online course or this YouTube
webinar.
To use ggplot2, the simplest route is to install the
tidyverse collection of packages, which includes, alongside
ggplot2, a host of other packages with functions that are
widely used to wrangle, model and visualise data in
R. These functions simply speed up things you can
still do in base R, usually with longer lines of
code, so to keep it simple and avoid confusion we are not going to cover
them much in this course. However, if you feel like you've grasped the basics
of data visualisation, you may want to try your hand at reproducing the
graphs from script 2 with ggplot2 instead of base
R. The code to do so is below and gives you a sense of the
syntax of ggplot.
#install.packages("tidyverse") # installs the package you need
library(tidyverse) #loads the package
#load the data
qog <- read.csv("qog.csv")
qog2 <- data.frame(table(region = qog$region)) #creates a dataframe of frequencies
ggplot(data = qog2, mapping = aes(x = region, y = Freq)) +
  geom_bar(stat = "identity") + #makes the barplot
  theme_minimal() + #removes ugly grey background
  theme(axis.text.x = element_text(angle = 90, vjust = 1)) + #rotates the x axis text
  ylab("Count") + #creates the y axis label
  ggtitle("Distribution of countries by region") #creates the title
ggplot(data = qog2, mapping = aes(x = region, y = Freq)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 1)) +
  ylab("Count") + #labels follow the aesthetic, so the flipped count axis is still y
  ggtitle("Distribution of countries by region") +
  coord_flip() #flips x and y axes
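As a possible shortcut for the two barplots above: geom_col() is ggplot2's equivalent of geom_bar(stat = "identity"), and geom_bar() with only an x aesthetic counts the rows itself, so the table() step becomes optional. The sketch below uses a small made-up data frame called toy (an assumption for illustration, not the qog data) so that it runs on its own.

```r
library(ggplot2)

# Hypothetical toy data standing in for qog (not the real dataset)
toy <- data.frame(region = c("Africa", "Africa", "Asia", "Europe", "Europe", "Europe"))

# Option 1: pre-computed frequencies + geom_col()
toy_freq <- data.frame(table(region = toy$region))
p_col <- ggplot(data = toy_freq, mapping = aes(x = region, y = Freq)) +
  geom_col() + #same as geom_bar(stat = "identity")
  theme_minimal() +
  ylab("Count")

# Option 2: raw rows + geom_bar(), which counts the rows per region itself
p_bar <- ggplot(data = toy, mapping = aes(x = region)) +
  geom_bar() +
  theme_minimal() +
  ylab("Count")

p_col
p_bar
```

Both plots should show the same bar heights; which option reads better depends on whether you already have the frequency table to hand.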
qog3 <- data.frame(table(Status = qog$freedomhouse_status)) #creates a dataframe of frequencies
ggplot(data = qog3, mapping = aes(x = Status, y = Freq, fill = Status)) +
  geom_bar(stat = "identity") +
  theme_minimal() + ylab("Frequency") + xlab("Freedom House Status") +
  ggtitle("Barplot")
ggplot(data = qog, mapping = aes(x = freedomhouse_status,
                                 y = human_devt_index,
                                 col = freedomhouse_status,
                                 fill = freedomhouse_status)) +
  geom_boxplot(alpha = 0.5) + theme_minimal() +
  xlab("Freedom House Rating") + ylab("Human Development Index") +
  theme(legend.position = "none") + ggtitle("Boxplot")
ggplot(data = qog, mapping = aes(x = human_devt_index)) +
  geom_histogram(binwidth = 0.05, fill = "grey",
                 col = "black", alpha = 0.2) +
  #alpha (0 to 1) makes the fill more or less transparent
  theme_minimal() +
  xlab("UNDP Human Development Indicator") + ylab("Count") +
  ggtitle("Distribution of HDI") +
  theme(plot.title = element_text(hjust = 0.5)) + #centers the plot title
  #red line: median; solid blue line: mean
  geom_vline(xintercept = median(qog$human_devt_index, na.rm=TRUE), col = "red") +
  geom_vline(xintercept = mean(qog$human_devt_index, na.rm=TRUE), col = "blue") +
  #dotted blue lines: one standard deviation above and below the mean
  geom_vline(xintercept = mean(qog$human_devt_index, na.rm=TRUE) +
               sd(qog$human_devt_index, na.rm=TRUE), col = "blue", lty = 3) +
  geom_vline(xintercept = mean(qog$human_devt_index, na.rm=TRUE) -
               sd(qog$human_devt_index, na.rm=TRUE), col = "blue", lty = 3)
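The histogram code above recomputes the mean and standard deviation several times. One way to shorten it, sketched here on simulated values (the data frame toy and its column hdi are placeholders, not the qog data), is to compute each statistic once and pass a vector of positions to geom_vline(), which draws one line per value.

```r
library(ggplot2)

set.seed(42)
# Simulated values standing in for qog$human_devt_index (hypothetical data)
toy <- data.frame(hdi = c(runif(99, min = 0.3, max = 0.95), NA))

# Compute each statistic once, dropping missing values
hdi_mean <- mean(toy$hdi, na.rm = TRUE)
hdi_median <- median(toy$hdi, na.rm = TRUE)
hdi_sd <- sd(toy$hdi, na.rm = TRUE)

p_stats <- ggplot(data = toy, mapping = aes(x = hdi)) +
  geom_histogram(binwidth = 0.05, fill = "grey", colour = "black",
                 alpha = 0.2, na.rm = TRUE) +
  geom_vline(xintercept = hdi_median, colour = "red") +
  geom_vline(xintercept = hdi_mean, colour = "blue") +
  #one call draws both the mean - sd and the mean + sd line
  geom_vline(xintercept = hdi_mean + c(-1, 1) * hdi_sd,
             colour = "blue", linetype = 3) +
  theme_minimal()

p_stats
```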
You can plot a histogram by groups using the color/fill
argument.
Note that the bars will be stacked by default. You can pass
position="identity" (bars overlaid) or position="dodge" (bars side by
side) to the geom_histogram() command to avoid this, but
these other approaches are not ideal if you have more than two groups,
as the bars for each group then crowd or cover one another.
ggplot(data = qog, mapping = aes(x = human_devt_index, fill = freedomhouse_status)) +
  geom_histogram(binwidth = 0.05, alpha = 0.6) + #bars are stacked by default
  theme_minimal() +
  xlab("UNDP Human Development Indicator") + ylab("Count") +
  ggtitle("Distribution of HDI by Freedom House status") +
  theme(plot.title = element_text(hjust = 0.5)) + #centers the plot title
  #one vertical line per group mean
  geom_vline(xintercept = mean(qog$human_devt_index[qog$freedomhouse_status=="Free"], na.rm=TRUE), col = "red") +
  geom_vline(xintercept = mean(qog$human_devt_index[qog$freedomhouse_status=="Partly Free"], na.rm=TRUE), col = "blue") +
  geom_vline(xintercept = mean(qog$human_devt_index[qog$freedomhouse_status=="Not Free"], na.rm=TRUE), col="green", lty=2)
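To see what the alternative to stacking looks like, here is a minimal sketch with position = "identity", which overlays the groups so a low alpha is needed to see both. The data are simulated and the names toy, hdi and status are placeholders, not columns of qog. The sketch also computes the group means once with aggregate(), so that each line can take its group's colour.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for qog: two groups with different means (hypothetical values)
toy <- data.frame(
  hdi = c(rnorm(50, mean = 0.8, sd = 0.05), rnorm(50, mean = 0.6, sd = 0.08)),
  status = rep(c("Free", "Not Free"), each = 50)
)

# One row per group holding that group's mean
toy_means <- aggregate(hdi ~ status, data = toy, FUN = mean)

p_overlay <- ggplot(data = toy, mapping = aes(x = hdi, fill = status)) +
  #position = "identity" overlays the groups instead of stacking them
  geom_histogram(binwidth = 0.05, alpha = 0.5, position = "identity") +
  #one dashed line per group mean, coloured by group
  geom_vline(data = toy_means,
             mapping = aes(xintercept = hdi, colour = status), linetype = 2) +
  theme_minimal() +
  xlab("Simulated index") + ylab("Count")

p_overlay
```

Swapping in position = "dodge" places the bars side by side instead, which tends to get crowded quickly as the number of groups grows.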
ggplot(data = qog, mapping = aes(x = human_devt_index)) +
  geom_density(bw = 0.025, fill = "lightblue", alpha = 0.2) +
  #bw sets the bandwidth of the density plot
  theme_minimal() +
  xlab("UNDP Human Development Indicator") + ylab("Density") +
  ggtitle("Density Plot") +
  theme(plot.title = element_text(hjust = 0.5)) #centers the plot title
ggplot(data = qog, mapping = aes(x = polity,
                                 y = fragile_state_index)) +
  geom_point() + theme_minimal() +
  #draws Italy and Greece again as larger red triangles on top of the other points
  geom_point(data = subset(qog, country %in% c("Italy", "Greece")), col = "red",
             shape = 17, size = 3) +
  ylab("Fragile State Index") + xlab("Polity") + ggtitle("Scatterplot")
ggplot(data = qog, mapping = aes(x = polity,
                                 y = fragile_state_index,
                                 label = iso3c,
                                 col = freedomhouse_status)) +
  geom_text(size = 3) + theme_minimal() +
  scale_color_manual(values = c("royalblue", "tomato", "violet"),
                     labels = c("Free", "Not Free", "Partly Free"),
                     name = "Freedom House Rating") +
  ylab("Fragile State Index") +
  xlab("Polity") + ggtitle("Scatterplot with Text Labels")
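The scale_color_manual() call above pairs colours and labels by position, so it relies on the groups appearing in alphabetical order. A slightly more defensive variant, sketched here on a made-up data frame (toy and its columns are placeholders, not the qog data), uses a named vector so each colour is tied to its group explicitly.

```r
library(ggplot2)

# Hypothetical toy data: three made-up cases, one per rating
toy <- data.frame(
  polity = c(10, -7, 3),
  fragility = c(20, 95, 60),
  code = c("AAA", "BBB", "CCC"),
  status = c("Free", "Not Free", "Partly Free")
)

# Named vector: the name is the group, the value is its colour
status_cols <- c("Free" = "royalblue", "Not Free" = "tomato", "Partly Free" = "violet")

p_named <- ggplot(data = toy, mapping = aes(x = polity, y = fragility,
                                            label = code, col = status)) +
  geom_text(size = 3) + theme_minimal() +
  scale_color_manual(values = status_cols, name = "Freedom House Rating") +
  ylab("Fragile State Index") + xlab("Polity")

p_named
```

With named values the legend labels come from the data itself, so the labels argument is no longer needed.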